I.  INTRODUCTION


Investigators in many fields are often confronted with research
problems in which a large number of factors (i.e., independent variables)
must be considered. In such cases the first step in experimentation
is usually the identification of the most important factors,
so that future research may be concentrated on the major factors.
Accordingly we often want to conduct an efficient preliminary screening
experiment aimed at determining the subset of important factors.

One resource-efficient screening strategy is two-stage group
screening. In this method, introduced by Watson (1961), the individual
factors (each at two levels) are partitioned into groups, forming
group factors. By assigning the same level to all component factors
within each group, the group factors are tested as if they were single
factors. All factors within groups found to have significant effects
are then tested individually in a second-stage experiment.

A key assumption in Watson's development of group screening is
that the directions of all effects are known or can be correctly assumed,
a priori. With this assumption, factor levels can be assigned so that
all effects are in the same direction. Thus, there is no chance
of cancellation of effects (within a group). This assumption, however,
is unlikely to hold exactly in practice. Consequently one
may hesitate to use a group screening design because important effects
may cancel if assumed effect directions are wrong.

In a previous paper, we [Mauro and Smith (1982)] examined the
extent to which cancellation affects the performance of two-stage
group screening designs when the response is observed without random
error (i.e., when the error standard deviation $\sigma$ is equal to zero).
In the present paper we extend this work to the case $\sigma > 0$. As part
of our investigation we have developed a computer-aided search routine
to select an optimal (in a sense to be defined later) group screening
plan. As in our earlier paper, we use the multifactorial designs of
Plackett and Burman (1946) to analyze the results of the first and
second stages.
II.  ASSUMPTIONS  AND  NOTATION


Suppose $K$ factors are to be screened for their effects on the
response. For detecting the factors having major effects, it is
usually sufficient to assume a first-order model:

$$y_i = \beta_0 + \sum_{j=1}^{K} \beta_j x_{ij} + \varepsilon_i, \tag{2.1}$$

where $y_i$ is the $i$th response, $\beta_0$ is a constant term common to every
response, $\beta_j$ ($j \ge 1$) is the linear effect of the $j$th factor, $x_{ij} = \pm 1$
is the level of the $j$th factor in the $i$th run, and $\varepsilon_i$ is the $i$th
error term. We make the following additional assumptions:

1. $k \ge 1$ ($k$ unknown) of the $K$ factors are active (i.e.,
have a true effect) and $(K-k)$ are inactive.

2. all active factors have the same absolute effect,
$\Delta > 0$, that is,

$$|\beta_j| = \begin{cases} \Delta, & \text{if factor } j \text{ is active} \\ 0, & \text{if factor } j \text{ is inactive,} \end{cases}$$

3. the error terms are independent and normally
distributed with mean zero and unknown variance $\sigma^2$.


We let $\underline{\beta}(i)$ for $i = 0, 1, 2, \ldots, k$ denote the effects arrangement in
which $i$ effects equal $-\Delta$, $(k-i)$ effects equal $+\Delta$, and $(K-k)$ effects
equal 0. The $\underline{\beta}(0)$ [or $\underline{\beta}(k)$] case, therefore, corresponds to the
situation where all active effects are in the same direction. Furthermore,
in the version of two-stage group screening we consider, we
assume that the $K$ factors are partitioned randomly into $G$ groups of
size $g$; if $K$ is not a multiple of $g$, we assume that the group sizes
are taken as "evenly" as possible. Concerning our assumptions, we
note that random grouping and equal absolute effects maximize the
chance of cancellation. Thus, with regard to studying the cancellation
effect, our assumptions define "worst case" conditions.
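To make the cancellation mechanism concrete, the following Python sketch (not part of the original report) builds a $\underline{\beta}(i)$ arrangement, partitions the factors at random into groups of size $g$, and counts the groups that contain at least one active factor but have zero net effect. The function names, the seed, and the unit effect size are illustrative assumptions.

```python
import random

def effects_arrangement(K, k, i, delta=1):
    """beta(i): i effects of -delta, (k - i) effects of +delta, (K - k) zeros."""
    return [-delta] * i + [delta] * (k - i) + [0] * (K - k)

def random_groups(beta, g, rng):
    """Randomly partition the effects into consecutive groups of size g."""
    shuffled = list(beta)
    rng.shuffle(shuffled)
    return [shuffled[s:s + g] for s in range(0, len(shuffled), g)]

def count_cancelled(groups):
    """Number of groups holding an active factor whose effects sum to zero."""
    return sum(1 for grp in groups
               if any(b != 0 for b in grp) and sum(grp) == 0)

rng = random.Random(0)  # arbitrary seed, for repeatability only
same_direction = random_groups(effects_arrangement(60, 8, 0), 3, rng)
mixed = random_groups(effects_arrangement(60, 8, 4), 3, rng)
print(count_cancelled(same_direction))  # 0: no cancellation is possible in beta(0)
print(count_cancelled(mixed))           # depends on the random partition
```

In the $\underline{\beta}(0)$ case the count is always zero, whereas in the $\underline{\beta}([k/2])$ case a group can hide its active factors whenever equal numbers of $+\Delta$ and $-\Delta$ effects land together.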

For reasons of economy and to avoid design saturation (i.e.,
no degrees of freedom to estimate $\sigma$), we employ at both stages of
screening the smallest Plackett-Burman (PB) design that has at least
one error degree of freedom. Since PB designs are only available
for numbers of runs that are multiples of four, the number of first-stage
runs $N_1$ required to test the $G$ group factors will therefore
be $B(G+1)$ where

$$B(x) = x + 4 - (x \bmod 4). \tag{2.2}$$

Similarly, if $S$ denotes the number of factors that reach the second
stage, then the number of second-stage runs $N_2$ will be $B(S+1)$. Thus,
the total number of runs $R$ required by both stages of group screening
will be $R = N_1 + N_2 = B(G+1) + B(S+1)$. We note that because $S$ is random,
so is $R$.
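As a quick check on (2.2), the run-count rule can be written out directly. The Python sketch below is illustrative only; the function names are assumptions, and $K$ is taken to be a multiple of $g$.

```python
def pb_runs(x):
    """B(x) = x + 4 - (x mod 4): the smallest multiple of four strictly greater than x."""
    return x + 4 - x % 4

def first_stage_runs(K, g):
    """N1 = B(G + 1) for G = K / g groups (K assumed to be a multiple of g)."""
    return pb_runs(K // g + 1)

def total_runs(K, g, S):
    """R = B(G + 1) + B(S + 1) when S factors reach the second stage."""
    return first_stage_runs(K, g) + pb_runs(S + 1)

print(first_stage_runs(60, 3))  # 24 runs for G = 20 groups
print(first_stage_runs(60, 2))  # 32 runs for G = 30 groups
print(total_runs(60, 3, 12))    # 24 + B(13) = 24 + 16 = 40
```

Because $B(x)$ is strictly greater than $x$, a design with $B(G+1)$ runs always leaves at least one degree of freedom for error after fitting the constant term and the $G$ group effects.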

Regarding formal significance testing, the results of the first
and second stages can be analyzed by the usual analysis of variance
procedures for factorial experiments. We denote the significance
levels of the (two-sided) t tests performed at the end of the first
and second stages by $\alpha_1$ and $\alpha_2$, respectively. Our version of two-stage
group screening, therefore, is completely determined by $g$,
$\alpha_1$, and $\alpha_2$. Accordingly, we will denote such a strategy by
GS($g$, $\alpha_1$, $\alpha_2$). In the next section we show how the quantities $g$,
$\alpha_1$, and $\alpha_2$ affect the performance of the GS($g$, $\alpha_1$, $\alpha_2$) strategy.


III.  PERFORMANCE  EVALUATION 


We can define three separate measures of performance. These are:

Power. We denote by $A$ the number of active factors that
are detected correctly, and we define

$$E_A = 100\,E(A)/k \tag{3.1}$$

as a percentage measure of the "power" of a GS strategy
for detecting the active factors.


Type I Error. We denote by $U$ the number of inactive
factors that are declared active (important), and we define

$$E_U = 100\,E(U)/(K-k) \tag{3.2}$$

as a percentage measure of Type I error (i.e., declaring
active an inactive factor).


Relative Testing Cost. We define relative testing cost

$$E_R = 100\,E(R)/B(K+1) \tag{3.3}$$

as the ratio, expressed as a percentage, of the expected
number of runs required by a GS strategy to the number of
runs required by the smallest PB design for $K$ factors that
has at least one error degree of freedom.

A larger value of $E_A$, a smaller value of $E_U$, or a smaller value
of $E_R$ indicates better performance on the average, but all three measures
should be considered in assessing and comparing the performance
of GS($g$, $\alpha_1$, $\alpha_2$) strategies. In general, selecting a suitable
GS($g$, $\alpha_1$, $\alpha_2$) strategy will require that trade-offs be made between $E_A$, $E_U$,
and $E_R$. In many ways our problem is like the testing of a statistical
hypothesis in which we want the sample size (relative testing cost)
and Type I error to be small but the power to be large.
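The three measures (3.1) to (3.3) are simple rescalings of the expected values $E(A)$, $E(U)$, and $E(R)$. A minimal Python sketch follows; the function names are assumptions, and the numerical inputs in the usage lines are hypothetical values chosen for illustration, not results from this report.

```python
def pb_runs(x):
    """B(x) from (2.2): smallest multiple of four strictly greater than x."""
    return x + 4 - x % 4

def performance(exp_A, exp_U, exp_R, K, k):
    """Return (E_A, E_U, E_R) as percentages, following (3.1)-(3.3)."""
    E_A = 100.0 * exp_A / k                # power
    E_U = 100.0 * exp_U / (K - k)          # Type I error
    E_R = 100.0 * exp_R / pb_runs(K + 1)   # relative testing cost
    return E_A, E_U, E_R

# Hypothetical expected values for K = 60, k = 8 (illustration only).
E_A, E_U, E_R = performance(exp_A=6.0, exp_U=2.6, exp_R=32.0, K=60, k=8)
print(E_A, E_U, E_R)
```

With $K = 60$ the reference design has $B(61) = 64$ runs, so an expected total of 32 runs corresponds to a relative testing cost of 50%.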
In the Appendix we derive the expected values of $A$, $U$, and $R$ for
any $K$, $k$, $\underline{\beta}(i)$, $g$, $\alpha_1$, $\alpha_2$, and signal-to-noise ratio $\Delta/\sigma$. Using these
results we developed a computer program that gives the performance
properties of alternative GS($g$, $\alpha_1$, $\alpha_2$) strategies. To illustrate its
use, we applied our program to two case studies, which we will refer
to as Study A and Study B. In Study A we evaluated $E_A$, $E_U$, and $E_R$
when $K=60$, $k=8$, $\Delta/\sigma = 1, 3, \infty$ and $\underline{\beta} = \underline{\beta}(0), \underline{\beta}(4)$ for $g = 3$ and 6, $\alpha_1 = .01,
.05, .10$ and $\alpha_2 = .01, .05, .10$. In Study B we evaluated $E_A$, $E_U$, and
$E_R$ when $K=240$, $k=32$, $\Delta/\sigma = 1, 3, \infty$ and $\underline{\beta} = \underline{\beta}(0), \underline{\beta}(16)$ for the same $g$,
$\alpha_1$, $\alpha_2$ combinations considered in Study A. We chose to consider just
the $\underline{\beta}(0)$ and $\underline{\beta}([k/2])$ cases because of the symmetry between the $\underline{\beta}(i)$
and $\underline{\beta}(k-i)$ arrangements. Moreover, the probability of group-factor
effect cancellation is maximized in the $\underline{\beta}([k/2])$ case. The results
obtained for Studies A and B are shown in Tables 1A and 1B, respectively.

In Tables 1A and 1B we give the limiting values of $E_R$, $E_A$, and $E_U$
as $\Delta/\sigma \to \infty$ ($\sigma > 0$). We note that these results are not directly comparable
to those given by Mauro and Smith (1982) under the assumption
that $\sigma = 0$. The zero and nonzero error cases are fundamentally different
since when $\sigma = 0$ the testing process is totally deterministic.

It is intuitive that an increase in $\alpha_1$ or $\alpha_2$ will increase both
$E_A$ and $E_U$. An increase in $\alpha_1$ will also increase the expected number
of runs made, and thus the relative testing cost $E_R$. However, $E_R$ does
not depend on $\alpha_2$, a fact that will be important to us in later discussion.
The actual extent of these movements for our two case studies can be
seen from Tables 1A and 1B.
Table 1B. Performance Results for Study B: K=240, k=32. Values in Parentheses
Were Obtained in $\underline{\beta}(16)$ Case; Values Outside Parentheses Were Obtained
in $\underline{\beta}(0)$ Case.


We can further add that as $\alpha_1 \to 0$, the probability of detecting
any group effect approaches zero. Thus, in this case, $E_A \to 0$, $E_U \to 0$,
and $E_R \to 100\gamma$, where $\gamma = B(G+1)/B(K+1)$. As $\alpha_1 \to 1$, all groups will
be found to have a significant effect with probability one, in which
case the second stage simply becomes a PB experiment for all $K$ factors
in $B(K+1)$ runs; thus, $E_U \to \alpha_2$, $E_A \to \pi_{PB}$, and $E_R \to 100(1+\gamma)$, where
$\pi_{PB}$ is the corresponding power for detecting an effect of magnitude $\Delta$
in the associated PB design (with $\alpha_2$ and $\pi_{PB}$ expressed as percentages).
Corresponding limits as $\alpha_2 \to 0$ or 1 are less interesting.
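As a worked illustration of the limiting relative testing cost, take the Study A setting $K = 60$ with the two group sizes considered there. The arithmetic below follows from (2.2) alone and assumes $G = K/g$.

```latex
% K = 60, so the reference PB design has B(K+1) = B(61) = 64 runs.
\begin{align*}
g = 3:&\quad G = 20,\; B(G+1) = B(21) = 24,\; \gamma = 24/64 = 0.375, \\
      &\quad E_R \to 100\gamma = 37.5 \ (\alpha_1 \to 0), \qquad
             E_R \to 100(1+\gamma) = 137.5 \ (\alpha_1 \to 1); \\
g = 6:&\quad G = 10,\; B(G+1) = B(11) = 12,\; \gamma = 12/64 = 0.1875, \\
      &\quad E_R \to 18.75 \ (\alpha_1 \to 0), \qquad
             E_R \to 118.75 \ (\alpha_1 \to 1).
\end{align*}
```

The upper limit exceeds 100% because, when every group is carried forward, the first-stage runs are spent in addition to a full PB experiment on all $K$ factors.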

As noted in Tables 1A and 1B, performance values enclosed by
parentheses were obtained in the $\underline{\beta}([k/2])$ case; those directly preceding
the parentheses are corresponding values that were obtained in the $\underline{\beta}(0)$
case. Any differences between these values, therefore, are due to the
cancellation effect. Since the chance of cancellation is zero in
the $\underline{\beta}(0)$ case and at a maximum in the $\underline{\beta}([k/2])$ case, these differences
specify the maximum effect of cancellation on performance.

An examination of the results shows that of the three performance
measures considered, $E_A$ is the most sensitive to the cancellation
effect. In some of the cases considered $E[A \mid \underline{\beta}([k/2])]$ is 70% of
$E[A \mid \underline{\beta}(0)]$. Further, although we would tend to use fewer runs as
the chance of cancellation increases [equivalently, as $i$ increases
from 0 to $[k/2]$ in $\underline{\beta}(i)$], this apparent advantage is offset by the
fact that we would also tend to detect fewer active factors.

In Studies A and B the proportion of factors that are active
(i.e., $k/K$) is the same. Mauro and Smith (1982) found that in the
deterministic case ($\sigma = 0$) power and relative testing cost are
essentially a function of $k/K$ and $\underline{\beta}(i/k)$. Inspection of Tables
1A and 1B clearly shows that this result does not hold when $\sigma > 0$.
Thus, in this case, it will be necessary to consider each $(K, k)$
combination separately.

An important practical consideration in the use of a GS strategy
is the number of error degrees of freedom (e.d.f.) for testing group
effects in the first stage. For testing $G$ groups in a PB design
having $B(G+1)$ runs, e.d.f. $= B(G+1) - (G+1)$, and thus, $1 \le$ e.d.f. $\le 4$.
When e.d.f. $= 1$ the efficiency of the PB design can be extremely poor.
See, for example, the case with $K=60$ and $g=6$ in Table 1A. In such
cases a more reasonable strategy may be to employ a larger PB design
or to use one less group and partition the factors as "evenly" as
possible. Such considerations would also apply in the second stage
where e.d.f. $= B(S+1) - (S+1)$. At this stage, however, a reasonable
alternative for increasing e.d.f. may be to combine effects into a
pooled error estimate.
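The error degrees of freedom are easy to tabulate from (2.2). The short Python sketch below (function names are assumptions) reproduces the $K=60$, $g=6$ case mentioned above and shows the effect of dropping one group.

```python
def pb_runs(x):
    """B(x) from (2.2): smallest multiple of four strictly greater than x."""
    return x + 4 - x % 4

def error_df(n_factors):
    """e.d.f. = B(n + 1) - (n + 1) when n factors (or group factors) are tested."""
    return pb_runs(n_factors + 1) - (n_factors + 1)

print(error_df(10))  # K = 60, g = 6: G = 10 groups in 12 runs -> 1
print(error_df(9))   # one less group: 9 groups in 12 runs -> 2
print(error_df(20))  # K = 60, g = 3: G = 20 groups in 24 runs -> 3
```

Using one less group keeps the same 12-run design in this example while doubling the degrees of freedom available for estimating error.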

We make one final observation. Tables 1A and 1B suggest that
$E_U$ is directly proportional to $\alpha_2$ in a GS($g$, $\alpha_1$, $\alpha_2$) strategy. Indeed,
using (A.2) and (A.5) in the Appendix, we see that

$$E_U = 100 \cdot \alpha_2 \cdot P\{\text{an inactive factor reaches the second stage}\}. \tag{3.4}$$

We will make use of this fact in the next section.
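For completeness, the step from (A.2) and (A.5) to (3.4) can be written out as follows, using the definition of $E_U$ in (3.2) and the Appendix notation $D_{k+1}$ and $R_{1,k+1}$ for an inactive factor.

```latex
\begin{align*}
E_U &= \frac{100\,E[U \mid \underline{\beta}(i)]}{K-k}
     = \frac{100\,(K-k)\,E[D_{k+1} \mid \underline{\beta}(i)]}{K-k}
     && \text{by (A.2)} \\
    &= 100\,\alpha_2\,P[R_{1,k+1} = 1 \mid \underline{\beta}(i)]
     && \text{by (A.5)}.
\end{align*}
```

Since the first-stage probability does not involve $\alpha_2$, $E_U$ is linear in $\alpha_2$ for fixed $g$ and $\alpha_1$.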
IV.  SEARCH  ROUTINE


To select a GS strategy, the experimenter must specify the group
size $g$ and the significance levels of the first and second stage tests,
$\alpha_1$ and $\alpha_2$. In general there are no obvious choices for $g$, $\alpha_1$, and $\alpha_2$. In
order to choose the best strategy for a particular application, trade-offs
will need to be made between $E_R$, $E_A$, and $E_U$. In this section we
present a computer-aided search routine to help select a "good" GS
strategy. The search program is written in standard FORTRAN and is
available upon request from the authors.

In order to use the search routine, the experimenter must first
specify a maximum tolerable relative testing cost, say $E_R^*$, and a maximum
tolerable Type I error, say $E_U^*$. Subject to $E_R \le E_R^*$ and $E_U \le E_U^*$, the
search algorithm determines, for various group sizes, the values of
$\alpha_1$ and $\alpha_2$ that maximize power ($E_A$). From the program output, the group
size which gives the greatest power may then be selected. The basic
steps in the search algorithm are outlined in Table 2.

Step 4 of the algorithm makes use of the fact that $E_R$ is an increasing
function in $\alpha_1$ and does not depend on $\alpha_2$. The $\alpha_2$ defined by Step 5 can
be quickly determined from (3.4). Further, the overall logic of the algorithm
is based on the premise that power increases as relative testing
cost increases.
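Steps 4 and 5 can be sketched in a few lines of Python. This is not the authors' FORTRAN program: the bisection search, the function names, and the caller-supplied `expected_cost` function (returning $E_R$ for a given $\alpha_1$) are all assumptions made for illustration. The Step 5 formula rests on (3.4) together with the stated premise that power does not decrease as $\alpha_2$ increases, so the largest feasible $\alpha_2$ is chosen.

```python
def max_alpha1(expected_cost, cost_cap, tol=1e-6):
    """Step 4 sketch: largest alpha_1 in [0, 1] with E_R(alpha_1) <= cost_cap.

    expected_cost is a hypothetical callable, assumed nondecreasing in alpha_1.
    Returns None when E_R at alpha_1 = 0 already equals or exceeds the cap.
    """
    if expected_cost(0.0) >= cost_cap:
        return None
    if expected_cost(1.0) <= cost_cap:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:  # bisection keeps expected_cost(lo) <= cost_cap
        mid = (lo + hi) / 2.0
        if expected_cost(mid) <= cost_cap:
            lo = mid
        else:
            hi = mid
    return lo

def max_alpha2(p_inactive_reaches_stage2, type1_cap):
    """Step 5 sketch: by (3.4), E_U = 100 * alpha_2 * P, so alpha_2 = E_U* / (100 P), capped at 1."""
    if p_inactive_reaches_stage2 <= 0.0:
        return 1.0
    return min(1.0, type1_cap / (100.0 * p_inactive_reaches_stage2))

# Toy cost curve (illustration only): E_R rises linearly from 30% to 70%.
alpha1 = max_alpha1(lambda a: 30.0 + 40.0 * a, cost_cap=50.0)
alpha2 = max_alpha2(p_inactive_reaches_stage2=0.5, type1_cap=5.0)
print(alpha1, alpha2)
```

In practice `expected_cost` and the probability passed to `max_alpha2` would come from the Appendix results, evaluated at the $\alpha_1$ found in Step 4.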

Table 3 contains sample computer printout from the search routine
for the case $K=60$, $k=8$, $\Delta/\sigma=2$, $E_R^*=50\%$, and $E_U^*=5\%$. Asterisks appearing
in the printout signify a case (i.e., group size) in which the value of
$E_R$ at $\alpha_1=0$ equals or exceeds $E_R^*$. In Table 3, for instance, when $g=2$
the first-stage PB experiment requires 32 runs, thus leaving no runs
available for the second-stage follow-up experiment.

Step 1.  Input values for $K$, $k$, and $\Delta/\sigma$.

Step 2.  Input maximum tolerable values for $E_R$ and $E_U$.

Step 3.  Assume $\underline{\beta}(0)$ case and $g=2$.

Step 4.  Determine $\alpha_1$ so that $E_R$ attains maximum allowable value.

Step 5.  For the $\alpha_1$ determined in Step 4, determine the $\alpha_2$
         that maximizes $E_A$ subject to constraint specified
         in Step 2.

Step 6.  Calculate $E_A$ and $E_U$ for given GS($g$, $\alpha_1$, $\alpha_2$) strategy.

Step 7.  Repeat Steps 4, 5, and 6 as long as $g < \min(8, K/2)$.

Step 8.  Reset $g=2$ and repeat Steps 4 through 7 for $\underline{\beta}([k/2])$
         case.

Table 2. Outline of Two-Stage Group Screening
Search Algorithm

Table 3. Sample Computer Printout From Two-Stage Group Screening Search
Routine When K=60, k=8, and $\Delta/\sigma$=2. Maximum $E_U$ Specified As 5%
And Maximum $E_R$ Specified As 50% (Equivalently, E(R)=32 Runs).
Note: Power, Type I Error, and Relative Testing Cost Are
Expressed As Percentages.

The search routine is based on the performance results derived in
the Appendix, which are only applicable when $K$ is a multiple of $g$. Therefore,
for group sizes where this restriction is not met, the program redefines
$K$ as the nearest multiple of $g$ and denotes the new value as KSTAR.
Performance results are then calculated as if there are KSTAR factors to
be screened. It seems reasonable that these results should be comparable
to the true performance had the $K$ factors been partitioned as "evenly"
as possible.

As a final observation, we note that if the group sizes considered
in Table 3 are ranked according to their corresponding power, the ranking
is the same in the $\underline{\beta}(0)$ and $\underline{\beta}(4)$ cases. We have found this phenomenon to
be generally true for the different cases we have looked at. Greater
power, of course, is attained in the $\underline{\beta}(0)$ case, when no cancellation can
occur.

The  search  routine  presented  in  this  section  provides  guidance  in 
using  and  selecting  a  satisfactory  GS  strategy.  The  routine  supplies  the 
user  with  quantitative  information  needed  to  determine  whether  two-stage 
group  screening  is  suitable  for  a  particular  application. 




V.  SUMMARY  AND  REMARKS 


In this paper we examine the performance characteristics of two-stage
group screening experiments, extending the previous work of Mauro
and Smith (1982). The analysis in Section III indicates the extent to
which the choice of group size and of the significance levels of the
first and second stage tests affect performance. We evaluate performance
as a function of a constant signal-to-noise ratio for all active
(i.e., nonzero) factors, assuming random grouping and a first-order
model.

In screening experiments an experimenter may hesitate to use group
screening because important effects may cancel if not all effect directions
are known. A key feature in our development is that we make
no presumption concerning effect direction, only effect magnitude. In
fact, the assumptions of random grouping and equal absolute effects
define "worst case" conditions with regard to possible cancellation of
effects.

In general, the efficacy of group screening in a given application
must be judged by trade-offs between factor classification and testing
cost. To facilitate this process we have developed a computer-aided
search routine. The results of this paper
can be used as a practical guide in decisions about the possible use
and choice of a two-stage group screening strategy.
APPENDIX


Part  1.  Introduction 


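Without loss of generality, let $\underline{\beta}(i)$ denote the case in which
$\beta_1 = \beta_2 = \cdots = \beta_i = -\Delta$, $\beta_{i+1} = \beta_{i+2} = \cdots = \beta_k = +\Delta$, and $\beta_{k+1} = \beta_{k+2} = \cdots = \beta_K = 0$.
Suppose that $K = gG$ where $g$ denotes the group size and $G$ denotes the
number of groups. We define $A$ and $U$ as in (3.1) and (3.2). We
now define

$$R_{1j} = \begin{cases} 1, & \text{if the } j\text{th factor is in a group that shows a significant effect in the first stage} \\ 0, & \text{otherwise} \end{cases}$$

$$R_{2j} = \begin{cases} 1, & \text{if the } j\text{th factor shows a significant effect in the second stage} \\ 0, & \text{otherwise.} \end{cases}$$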


In a GS($g$, $\alpha_1$, $\alpha_2$) strategy, the $j$th factor is declared important
only if both $R_{1j} = 1$ and $R_{2j} = 1$. Accordingly, we define $D_j = R_{1j} R_{2j}$ and
observe that $A = \sum_{j \le k} D_j$ and $U = \sum_{j > k} D_j$. The number of factors $S$ that
reach the second stage is given by $S = \sum_{j=1}^{K} R_{1j}$.


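Because of symmetry we can write

$$E[A \mid \underline{\beta}(i)] = i\,E[D_1 \mid \underline{\beta}(i)] + (k-i)\,E[D_{i+1} \mid \underline{\beta}(i)]$$
$$\qquad = i\,E[D_1 \mid \underline{\beta}(i)] + (k-i)\,E[D_1 \mid \underline{\beta}(k-i)]. \tag{A.1}$$

Similarly

$$E[U \mid \underline{\beta}(i)] = (K-k)\,E[D_{k+1} \mid \underline{\beta}(i)], \tag{A.2}$$

and

$$E[S \mid \underline{\beta}(i)] = i\,E[R_{11} \mid \underline{\beta}(i)] + (k-i)\,E[R_{11} \mid \underline{\beta}(k-i)] + (K-k)\,E[R_{1,k+1} \mid \underline{\beta}(i)]. \tag{A.3}$$

Further,

$$E[D_1 \mid \underline{\beta}(i)] = P[R_{11}=1 \mid \underline{\beta}(i)]\,P[R_{21}=1 \mid R_{11}=1, \underline{\beta}(i)], \tag{A.4}$$

and

$$E[D_{k+1} \mid \underline{\beta}(i)] = P[R_{1,k+1}=1 \mid \underline{\beta}(i)]\,\alpha_2. \tag{A.5}$$

Thus, to evaluate the expectations of $A$, $U$, and $S$ it suffices to
evaluate $P[R_{1j}=1 \mid \underline{\beta}(i)]$ for $j=1$ and $j=k+1$ and evaluate $P[R_{2j}=1 \mid R_{1j}=1, \underline{\beta}(i)]$
for $j=1$.

Regarding the expected total runs,

$$E(R) = B(G+1) + E[B(S+1)]. \tag{A.6}$$

Using the approximation $B(x) \approx x + 2.5$, we have

$$E[R \mid \underline{\beta}(i)] \approx B(G+1) + 3.5 + E[S \mid \underline{\beta}(i)]. \tag{A.7}$$

Note that $|B(x) - (x+2.5)| \le 1.5$.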
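The constant 2.5 and the bound 1.5 can be checked numerically: the excess $B(x) - x$ cycles through 4, 3, 2, 1 as $x \bmod 4$ runs over 0, 1, 2, 3, which averages 2.5. The Python sketch below (function names are assumptions) verifies the bound over a range of $x$ and evaluates (A.7) for an illustrative, hypothetical value of $E(S)$.

```python
def pb_runs(x):
    """B(x) from (2.2): smallest multiple of four strictly greater than x."""
    return x + 4 - x % 4

def approx_expected_runs(G, expected_S):
    """(A.7): E(R) is approximately B(G + 1) + 3.5 + E(S)."""
    return pb_runs(G + 1) + 3.5 + expected_S

# Largest absolute error of the approximation B(x) ~ x + 2.5 over x = 1..1000.
worst_error = max(abs(pb_runs(x) - (x + 2.5)) for x in range(1, 1001))
print(worst_error)                     # 1.5

# Hypothetical E(S) = 12 with G = 20 groups (illustration only).
print(approx_expected_runs(20, 12.0))  # 24 + 3.5 + 12 = 39.5
```

The approximation replaces the expectation of a step function of $S$ by a linear function of $E(S)$, which is what allows (A.3) to be used directly.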

Part 2.  Derivation of the Probability that an Active Factor Reaches
Second Stage

Without loss of generality suppose that factor #1 is placed into
group #1 and assume that $1 \le i \le k$. Now, given that factor #1 ($\beta_1 = -\Delta$)
is placed into group #1, there are $(i-1)$ effects of $-\Delta$, $(k-i)$ effects
of $+\Delta$, and $(K-k)$ zero effects left to be distributed into groups.

We define

$$\Psi(n;\delta;\alpha) = P\{\,|T_n(\delta)| \ge t(n;\alpha/2)\,\} \tag{A.8}$$

where $T_n(\delta)$ denotes a random variable having a noncentral t distribution
with $n$ degrees of freedom and noncentrality parameter $\delta$, and where
$t(n;\alpha/2)$ denotes the upper $100(1-\alpha/2)$ percentage point of "Student's"
t distribution with $n$ degrees of freedom.

It is not difficult to see that

$$P[R_{11}=1 \mid \underline{\beta}(i)] = \sum_{j=0}^{J} \sum_{m=0}^{M} p(j,m)\,\Psi(f_1;\delta(j,m);\alpha_1), \tag{A.9}$$

where $J = \min(g-1,\, i-1)$, $M = \min(g-1-j,\, k-i)$, $f_1 = N_1 - G - 1$, $N_1 = B(G+1)$,
$\delta(j,m) = \sqrt{N_1}\,(m-j-1)\Delta/\sigma$, and

$$p(j,m) = \frac{\binom{i-1}{j}\binom{k-i}{m}\binom{K-k}{g-1-j-m}}{\binom{K-1}{g-1}}. \tag{A.10}$$

The quantity $p(j,m)$ defined in (A.10) is the probability that $j$ effects
of $-\Delta$ and $m$ effects of $+\Delta$ fall into group #1 along with factor #1.

The $\Psi$ quantity appearing in (A.9) is the power of the t-test associated
with group #1 given that $(j+1)$ effects of $-\Delta$, $m$ effects of $+\Delta$, and
$(g-1-j-m)$ zero effects are placed in group #1.
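The group-composition probabilities can be computed with exact binomial coefficients. The Python sketch below assumes that $p(j,m)$ is the hypergeometric probability described in the text, namely that $j$ of the remaining $(i-1)$ negative effects and $m$ of the $(k-i)$ positive effects join factor #1 when the other $g-1$ members of its group are drawn from the remaining $K-1$ factors. Function names are illustrative.

```python
from math import comb

def p_active(j, m, K, k, i, g):
    """Probability that j effects of -Delta and m of +Delta join factor #1 in its group."""
    zeros = g - 1 - j - m  # inactive factors filling the rest of the group
    if zeros < 0:
        return 0.0
    return (comb(i - 1, j) * comb(k - i, m) * comb(K - k, zeros)
            / comb(K - 1, g - 1))

def composition_probs(K, k, i, g):
    """All (j, m) pairs within the summation limits J and M, with their probabilities."""
    probs = {}
    for j in range(min(g - 1, i - 1) + 1):
        for m in range(min(g - 1 - j, k - i) + 1):
            probs[(j, m)] = p_active(j, m, K, k, i, g)
    return probs

probs = composition_probs(K=60, k=8, i=4, g=3)
print(sum(probs.values()))  # the probabilities sum to 1 (up to rounding)

# Group #1 has zero net effect when m = j + 1, i.e. delta(j, m) = 0.
p_cancel = sum(p for (j, m), p in probs.items() if m == j + 1)
print(p_cancel)
```

The terms with $m = j+1$ are those for which the noncentrality parameter in the first-stage test vanishes, so the test's rejection probability drops to its nominal level $\alpha_1$.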


Part 3.  Derivation of the Probability that an Inactive Factor Reaches
Second Stage

To derive $P[R_{1,k+1}=1 \mid \underline{\beta}(i)]$ we can repeat the argument of Part 2
for the $(k+1)$st factor ($\beta_{k+1}=0$). Doing so, we obtain

$$P[R_{1,k+1}=1 \mid \underline{\beta}(i)] = \sum_{j=0}^{J^*} \sum_{m=0}^{M^*} p^*(j,m)\,\Psi(f_1;\delta^*(j,m);\alpha_1), \tag{A.11}$$

where $J^* = \min(g-1,\, i)$, $M^* = \min(g-1-j,\, k-i)$, $\delta^*(j,m) = \sqrt{N_1}\,(m-j)\Delta/\sigma$, and

$$p^*(j,m) = \frac{\binom{i}{j}\binom{k-i}{m}\binom{K-k-1}{g-1-j-m}}{\binom{K-1}{g-1}}. \tag{A.12}$$


Part 4.  Derivation of the Probability that an Active Factor that
Reaches Second Stage is Declared Important

We let $H$ denote the number of groups in the first stage that show
a significant effect. Note that $S = gH$, so that $E(H) = E(S)/g$. The
expected value of $S$ can be computed per (A.3), (A.9), and (A.11).

We can write

$$P[R_{21}=1 \mid R_{11}=1, \underline{\beta}(i)] = \sum_{h=1}^{G} P[H=h \mid R_{11}=1, \underline{\beta}(i)]\,P[R_{21}=1 \mid H=h, R_{11}=1, \underline{\beta}(i)]. \tag{A.13}$$

The second factor within the summation in (A.13) can be evaluated as

$$P[R_{21}=1 \mid H=h, R_{11}=1, \underline{\beta}(i)] = \Psi(f_2(h);\delta_2(h);\alpha_2) \tag{A.14}$$

where $f_2(h) = B(hg+1) - (hg+1)$ and $\delta_2(h) = -[B(hg+1)]^{1/2}\,\Delta/\sigma$. The
conditional distribution of $H$ given $R_{11}=1$ and $\underline{\beta}(i)$ is intractable,
however. The authors have found that the conditional distribution of
$H$ required in (A.13) can be reasonably approximated as $Y+1$ where $Y$ is
a binomially distributed random variable with parameters $(G-1)$ and
success probability $E(H)/G = E(S)/K$. Thus

$$P[R_{21}=1 \mid R_{11}=1, \underline{\beta}(i)] \approx \sum_{h=1}^{G} P(Y=h-1)\,\Psi(f_2(h);\delta_2(h);\alpha_2) \tag{A.15}$$

where $Y \sim b(G-1;\, E(S)/K)$.
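The binomial mixture in (A.15) is straightforward to evaluate once the second-stage power is available for each $h$. In the Python sketch below, `power_given_h` is a hypothetical caller-supplied function standing in for $\Psi(f_2(h);\delta_2(h);\alpha_2)$, which requires a noncentral t routine not shown here; `theta` denotes the success probability $E(S)/K$. Both names are assumptions.

```python
from math import comb

def stage2_detection_prob(G, theta, power_given_h):
    """Sum over h = 1..G of P(Y = h - 1) * power_given_h(h), with Y ~ Binomial(G - 1, theta)."""
    total = 0.0
    for h in range(1, G + 1):
        # Binomial probability that exactly h - 1 of the other G - 1 groups are significant.
        weight = comb(G - 1, h - 1) * theta ** (h - 1) * (1.0 - theta) ** (G - h)
        total += weight * power_given_h(h)
    return total

# Sanity checks with simple stand-ins for the power function.
print(stage2_detection_prob(20, 0.25, lambda h: 1.0))       # weights sum to 1
print(stage2_detection_prob(20, 0.25, lambda h: float(h)))  # mean of Y + 1 = 1 + 19 * 0.25
```

The first check confirms that the weights form a probability distribution over $h = 1, \ldots, G$; the second recovers the mean $1 + (G-1)\theta$ of the approximating variable $Y+1$.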